• Tempo of Gene Expression
  • Build GCN using timecourseRnaseq
  • Contact

On this page

  • (INT) Create databases
  • ONE-SHOT (make_modules)
  • STEP-BY-STEP
  • (PREP) Log-transform data
  • (SUBSET) Filter data
  • (WGCNA) Format data
    • QC
  • Calculate gene-gene similarity
    • QC
  • Create adjacency matrix
    • USER INPUT REQUIRED —-
  • Identify gene clusters
    • 2.1 Create topological overalp matrix
    • 2.2 Identify clusters
    • 2.3 Merge similar modules
    • 2.4 Calculate module-module similarity
    • 2.5 Visualize the network
  • Save module identity
  • Identify rhythmic modules
    • Comparison
  • Network statistics
    • Intramodular connectivity

Build gene co-expression network (GCN) from time-course gene expression data


Last updated on 2025-09-01.


Show code
library(dplyr)
library(dbplyr)
library(ggplot2)
for (i in list.files(here::here("R"), full.names = TRUE)) {
  source(i)
}

# SAMPLE NAME
## specify sample name
sample.names <- c(
  # dmel
  "dmel-head",
  # mmus
  "mmus-brain_stem", 
  # panu
  "panu-hypothalamus"
)
# sample.cycles <- c("LD", "DD")

## SPECIFY THE DATASET TO BUILD GCN WITH
which.sample <- sample.names[2]

writeLines(
  glue::glue("Sample: {which.sample}")
)
Sample: mmus-brain_stem

(INT) Create databases

Show code
data.db <- load_data(
  sample_names = sample.names
)

cat("Structure of input data:")
Structure of input data:
Show code
data.db[[which.sample]] |> 
  head(5) |> 
  mutate_if(
    is.numeric,
    ~ round(.x, 1)
  ) |> 
  DT::datatable(
    caption = "Column 1: Name of feature (gene), Columns 2-N: Timepoints, Value: Expression / Activity",
    rownames = FALSE
  )
Show code
# cat("Here are the tables, in each database.")
# purrr::map(
#   data.db,
#   ~ src_dbi(.x)
# )

ONE-SHOT (make_modules)

Show code
# data.db[[1]] |> 
#   glimpse() |> 
#   make_modules(
#     min_expression = 1,
#     min_timepoints = 4,
#   ) |> 
#   str(max.level = 1)

STEP-BY-STEP

(PREP) Log-transform data

Show code
# Define the function
dat <- log2_transform_data(
  data = data.db[[which.sample]],
  id_column = "gene_name"
)
Applying log2-transformation...Done.

(SUBSET) Filter data

Show code
# Which genes are expressed throughout the day in forager heads?
  # count the number of time points that has ≥ 1 FPKM
  # subset the data and only keep the filtered genes
  
dat_sub <- dat |> 
  subset_data(
    min_expression = 10,
    min_timepoints = 8,
    id_column = "gene_name"
  )
Subsetting data...Done.
[ NOTE ]: After subsetting, 3345 of 17406 rows remain.
Show code
writeLines("Dimensions of the data post-filtering step [Rows = #genes, Cols = #samples]")
Dimensions of the data post-filtering step [Rows = #genes, Cols = #samples]
Show code
dim(dat[-1])
[1] 17406     8

This is our cleaned, input data file for building the circadian GCN.

(WGCNA) Format data

  • transpose the dataset: rows = timepoints, cols = genes
Show code
datExpr <- dat_sub %>% 
  transpose_data() 

QC

Show code
datExpr %>%
  check_sample_quality()
 Flagging genes and samples with too many missing values...
  ..step 1
All okay!
Show code
datExpr %>% 
  plot_sample_expression()
Visualizing the log-transformed data

Calculate gene-gene similarity

Show code
# Calculate Kendall's tau-b correlation for each gene-gene pair
sim_matrix <- calculate_gene_gene_sim(
  data = datExpr,
  name = which.sample,
  cache = FALSE
)
Running gene-gene similarity...Done.

QC

Show code
cat("Before power transformation:")
Before power transformation:
Show code
plot_sim_matrix(
  matrix = sim_matrix
)
Plotting a chunk of the gene-gene similarity matrix with 200 genes.

Create adjacency matrix

USER INPUT REQUIRED —-

To create the adjacency matrix, we need to first identify the soft-thresholding power (see WGCNA for more info).

Show code
sft <- analyze_network_topology(
  data = datExpr
)
Performing network topology analysis to pick 
  soft-thresholding power...
   Power SFT.R.sq   slope truncated.R.sq mean.k. median.k. max.k.
1      1   0.7170  1.8500          0.644    1840    2050.0   2390
2      2   0.4340  0.5310          0.640    1280    1460.0   1970
3      3   0.0697  0.1310          0.908     981    1100.0   1710
4      4   0.0147 -0.0516          0.932     789     860.0   1520
5      5   0.1640 -0.1660          0.871     655     686.0   1370
6      6   0.3510 -0.2500          0.841     556     557.0   1250
7      7   0.5100 -0.3140          0.824     480     458.0   1150
8      8   0.6400 -0.3720          0.839     420     379.0   1070
9      9   0.7400 -0.4140          0.857     372     319.0    995
10    10   0.7990 -0.4530          0.859     331     270.0    931
11    12   0.8650 -0.5220          0.861     269     198.0    825
12    15   0.8890 -0.6030          0.862     205     129.0    703
13    18   0.8870 -0.6720          0.857     162      87.2    610
14    21   0.9020 -0.7200          0.883     131      61.4    537

Plotting resutls from the network topology analysis...

Done.
[ NOTE, FIGURE ]: Red horizontal line indicates a signed R^2 of 0.9

NOTE: The scale-free topology fit index reaches close to 1 (red horizontal line) at a soft-thresholding-power of 12.

Now, we can go ahead and create our adjacency matrix by power-transforming the similarity matrix (see WGCNA for more info).

Show code
## Specify the soft-thresholding-power
soft.power = sft$powerEstimate

# Construct adjacency matrix
# TO DO: make function ----
adj_matrix <- WGCNA::adjacency.fromSimilarity(
  sim_matrix,
  power = soft.power,
  type = "signed"
) |> 
  as.matrix()

cat("After power transformation:")
After power transformation:
Show code
plot_sim_matrix(
  matrix = adj_matrix
)
Plotting a chunk of the gene-gene similarity matrix with 200 genes.

Identify gene clusters

The following steps are performed as per guidelines from the WGCNA package and several tutorials made available online.

2.1 Create topological overalp matrix

Show code
# Turn adjacency into topological overlap
dissTOM = 1 - WGCNA::TOMsimilarity(adj_matrix)
..connectivity..
..matrix multiplication (system BLAS)..
..normalization..
..done.
Show code
# Call the hierarchical clustering function
geneTree = perform_hclust(
  data = dissTOM
)

2.2 Identify clusters

User defined parameters:

  • minimum size (number of genes) of modules

We like large modules, so we set the minimum module size relatively high (50).

Show code
minModuleSize = 50
# Module identification using dynamic tree cut:
modules <- create_modules(
  tree = geneTree,
  dissTOM = dissTOM,
  data = datExpr,
  min_module_size = minModuleSize
)

Merging modules that have a correlation ≥ 0.9 ...Done.
[ NOTE, FIGURE ] Plotting identified clusters before and after merging.

Module (cluster) size:
mergedColors
        black          blue          cyan     darkgreen      darkgrey 
          153           450           195            85            67 
   darkorange       darkred darkturquoise         green   greenyellow 
          164            87            69          1135           111 
       grey60    lightgreen       magenta        orange           red 
          196            95           123            56           159 
    royalblue           tan 
           90           110 

2.3 Merge similar modules

User defined parameters:

  • minimum correlation between two modules above which they are merged into one | var-name: MEDissThres
Show code
merge = create_modules(
  tree = geneTree,
  dissTOM = dissTOM,
  min_module_size = minModuleSize,
  merge_cutoff_similarity = 0.8, # 70 % similarity
  data = datExpr
)

Merging modules that have a correlation ≥ 0.8 ...Done.
[ NOTE, FIGURE ] Plotting identified clusters before and after merging.

Module (cluster) size:
mergedColors
        black      darkgrey darkturquoise    lightgreen       magenta 
          153            67            69          1622           123 
       orange           red 
          146          1165 

2.4 Calculate module-module similarity

Show code
adj_matrix_ME <- calculate_module_module_sim(
  merged_modules = merge[["modules"]]
)
Calculating module-module similarity based 
  on module-eigengene-expression...Done.
Tidying module names...Done.
Plotting adjacency matrix for module-module similarity...

Done.

2.5 Visualize the network

Show code
plot_adj_as_network(
  layout = igraph::layout_as_tree,
  matrix = adj_matrix_ME$ME,
  min_edge = 0.5,
  node_label_size = 2,
  node_size = 45,
  edge_size = 5,
  node_frame_col = "grey20",
  node_fill_col = "grey80",
  vertex.frame.width = 4
)
Visualizing a simplified representation of the circadian GCN

Save module identity

Obtain a list of genes in each module

Show code
module_genes <- tidy_modules(
  merged_modules = merge[["colors"]],
  mapping_tbl = adj_matrix_ME$mapping_tbl,
  data = datExpr
)

# saveRDS(
#   module_genes,
#   here::here(
#     glue::glue(
#       "./data/tmp/{which.sample}_module_genes.RDS"
#     )
#   )
# )

# TO DO:
# Save the `dat_module_gene` as a database table in the respective database.
Compare GCN vs. ants
Show code
genes_across_species <- module_genes |> 
  convert_id(
    from = "mouse",
    to = c("cflo", "drosophila")
  )

genes_across_species |> 
  group_by(gene_name, module_identity) |> 
  reframe(
    dmel = length(
      unique(
        na.omit(drosophila_ID)
      )
    ),
    cflo = length(
      unique(
        na.omit(cflo_ID)
      )
    )
  ) |> 
  arrange(
    module_identity,
    desc(dmel),
    desc(cflo)
  ) |> 
  mutate(
    across(
      c(dmel, cflo),
      ~ case_when(
        .x > 1 ~ "multi",
        .x == 1 ~ "single",
        .x == 0 ~ "x",
        .default = NA
      ) |> 
        factor(
          levels = c("multi", "single", "x")
        )
    )
  ) |> 
  group_by(module_identity, dmel, cflo) |> 
  tally() |> 
  arrange(module_identity, desc(n)) |> 
  ungroup() |> 
  DT::datatable(
    rownames = FALSE
  )

Identify rhythmic modules

Show code
db_rhy <- load_rhy_genes(
  sample = which.sample
)
###-###-###-###-###-###-###-###-
# Set your p-value of choice
# col_pval = "BH.Q"
col_pval = "default.pvalue"
# col_pval = "raw.pvalue"
###-###-###-###-###-###-###-###-
l_module_genes <- module_genes |> 
  arrange(module_identity) |> 
  group_split(module_identity) |> 
  purrr::map(
    ~ .x |> pull(gene_name)
  ) |> 
  setNames(unique(module_genes[["module_identity"]]))

l_rhy_genes <- db_rhy |> 
  purrr::map(
    ~ .x |> 
      filter(
        if_all(
          c(col_pval),
          ~ .x < 0.05
        )
      ) |> 
      filter(
        ID %in% unlist(l_module_genes)
      ) |> 
      pull(1) |> 
      unique()
  ) |> 
  purrr::compact()

Comparison

Modules vs. Rhythmic genes

Show code
writeLines("#####################################################
How many genes are in each of my geneset of interest?
#####################################################")
#####################################################
How many genes are in each of my geneset of interest?
#####################################################
Show code
## MAKE YOUR LIST OF GENES OF INTEREST ##

# LIST ONE - WGCNA modules
list1 <- l_module_genes
writeLines("List of interesting genes #1
----------------------------
Genes in each of the identified gene-clusters or modules")
List of interesting genes #1
----------------------------
Genes in each of the identified gene-clusters or modules
Show code
sapply(list1, length)
  C1   C2   C3   C4   C5   C6   C7 
1165  123  146  153   69   67 1622 
Show code
## LIST TWO - rhythmic genes
list2 <- l_rhy_genes
writeLines("List of interesting genes #2
----------------------------
Rhythmic genes identified by different algorithms")
List of interesting genes #2
----------------------------
Rhythmic genes identified by different algorithms
Show code
sapply(list2, length)
      ARS    empJTK GeneCycle       JTK    meta2d      RAIN 
      362       127       129        78       276       255 
Show code
## CHECK FOR OVERLAP
# define size of genome
size = length(unique(c(unlist(list1), unlist(list2))))
# make a GOM object
gom.1v2 <- GeneOverlap::newGOM(
  list2, 
  list1,
  genome.size = size
)
# png(paste0(path_to_repo, "/results/figures/",
#            "02_pogo_GCN/",
#            sample.name[1],"_gom_1v2.png"),
#     width = 35, height = 15, units = "cm", res = 300)
GeneOverlap::drawHeatmap(
  gom.1v2,
  adj.p = TRUE,
  cutoff=0.05,
  what="odds.ratio",
  # what="Jaccard",
  log.scale = T,
  note.col = "black",
  grid.col = "Oranges"
)

Show code
# trash <- dev.off()

 # writeLines("How many genes exactly are overlapping between the pairwise comparisons")
# getMatrix(gom.1v4, name = "intersection") %>% t()

writeLines("Visualizing the significant overlaps between your lists of interesting genes and the identified modules")
Visualizing the significant overlaps between your lists of interesting genes and the identified modules

Network statistics

From WGCNA-tutorial

Intramodular connectivity

“We begin by calculating the intramodular connectivity for each gene. (In network literature, connectivity is often referred to as”degree”.) The function intramodularConnectivity computes:

  • the whole network connectivity kTotal,
  • the within (intra)module connectivity kWithin,
  • the extra-modular connectivity kOut=kTotal-kWithin, and
  • the difference of the intra- and extra-modular connectivities kDiff = kIn - kOut = 2*kIN-kTotal
Show code
# From what I can tell, colorh1 in the tutorial refers to moduleColors
colorh1 <- merge$colors

# Calculate the connectivities of the genes
Alldegrees1 = WGCNA::intramodularConnectivity(
  adjMat = adj_matrix, 
  colors = colorh1
) |> 
  tibble::rownames_to_column("gene_name") |>  
    mutate(
      across(
        matches("^k"),
        ~ round(.x, 2)
      )
    )

Calculate the signed kME and display the first few rows/columns.

Show code
datKME = WGCNA::signedKME(
  datExpr, 
  merge[["modules"]]$newMEs, 
  outputColumnName = ""
)
# # Display the first few rows of the data frame
# datKME[1:6,1:6]

Plotting the mean (± 95% CI) connectivity of genes in different modules

Show code
pd <- position_dodge(0.1)

# which_var <- "kTotal"
which_var <- c("kTotal", "kWithin", "kOut", "kDiff")

Alldegrees1 |>  
  # rownames_to_column("gene_name") %>% 
  left_join(
    module_genes, 
    join_by(gene_name)
  ) |> 
  # PLOT FROM RAW DATA
  mutate(
    module_identity = factor(
      module_identity, 
      levels = sort(unique(module_genes$module_identity)) |> rev()
    )
  ) |> 
  summarySE(
    measurevar = which_var, 
    groupvars = "module_identity"
  ) |> 
  mutate(
    type = factor(
      type,
      levels = which_var
    )
  ) |> 
  # Plot
  ggplot(
    aes(
      y = module_identity, 
      x = mean,
      group = interaction(module_identity, type)
    )
  ) +
  geom_vline(xintercept = 0, col = "maroon", alpha = 0.7) +
  labs(
    y = "",
    x = glue::glue(
      "Connectivity"
    ),
    title = ""
  ) +
  ## Add error bar here
  geom_errorbar(
    aes(xmin = mean-ci, xmax = mean+ci),
    width = 0.3, 
    position=pd, 
    lwd = 1.3,
    col="black", 
    alpha = 1
  ) +
  # Add the points on top of the error bars
  geom_point(
    position = pd, 
    size = 3,
    col = "black", 
    fill = "orange",
    show.legend = F, 
    pch = 21,
    alpha = 0.9
  ) +
  facet_wrap(
    ~ type,
    nrow = 2
  ) +
  theme_bw(25) +
  # scale_color_manual(values=c("#F20505","#F5D736","#0FBF67")) +
  theme(
    legend.position = "none"
  )